Dynamically programmable integrated switching device using an asymmetric 5T1C cell

ABSTRACT

A switching element including first, second and third ports each comprising a plurality of lines is disclosed. A first memory cell includes a storage element, a first pass gate for selectively coupling a first line of the first port to the storage element, a second pass gate for selectively coupling a first line of the second port to the storage element, and a third pass gate for selectively coupling a first line of the third port to the storage element. A second memory cell includes a storage element, a first pass gate for selectively coupling a second line of the first port to the storage element, a second pass gate for selectively coupling a second line of the second port to the storage element, and a third pass gate for selectively coupling a second line of the third port to the storage element.

FIELD OF INVENTION

[0001] The present invention relates in general to electronic switching devices and elements and in particular to dynamically programmable integrated switching devices suitable for use in high speed routing and switching applications.

BACKGROUND OF INVENTION

[0002] In networked systems, the interconnect or core switch fabric connecting the various system elements essentially attempts to connect N inputs to M outputs for the maximum number of possible routes. The “Non-Blocking” nature of the interconnection or the availability of “Clear Channels” enables the switch fabric to route or switch individual data packets.

[0003] In one interconnection architecture, the core switch fabric is based on time-domain multiple access (TDMA) to a common backplane or a shared bus. A controller, together with software, acts as the bus master and implements the routing kernel. The routing kernel is usually implemented in an algorithm such as a Hierarchical Weighted Fair Queuing algorithm.

[0004] Alternatively, the core switch fabric may be based on single or multiple crossbar integrated circuits. In this case, the controller asserts appropriate read and write commands to the crossbar and controls the exchange of data with a set of input and output buffers, typically constructed from common memory elements such as DRAM and SRAM. Switches are then built by using multiple cards which connect to the multiple input and output ports of the crossbar with a non-blocking switch fabric.

[0005] In any event, the devices interfacing with the switch fabric are reaching higher and higher speeds. This in turn requires a higher throughput rate through the switch fabric itself. Existing systems calculate aggregate throughput, in bits per second, by taking the throughput in bits per second for one port of the switch fabric and multiplying it by the total number of input and output ports. This aggregate capacity can be increased by varying the number of input and output ports on the switch fabric, the speed of operation of the switch fabric and the efficiency of the network processor. Notwithstanding, device physics and the electrical characteristics of busses and interconnects are still significant limiting factors on throughput speed.

[0006] Consequently, a switch element is required which, taken individually or in conjunction with other elements of a similar type, enables the design and fabrication of high speed scalable switch fabrics.

SUMMARY OF INVENTION

[0007] According to one embodiment of the principles of the present invention, a switching element is disclosed which includes first, second and third ports each comprising a plurality of lines. A first memory cell includes a storage element, a first pass gate for selectively coupling a first line of the first port to the storage element, a second pass gate for selectively coupling a first line of the second port to the storage element, and a third pass gate for selectively coupling a first line of the third port to the storage element. The switching element also includes a second memory cell having a first pass gate for selectively coupling a second line of the first port to the storage element, a second pass gate for selectively coupling a second line of the second port to the storage element, and a third pass gate for selectively coupling a second line of the third port to the storage element.

[0008] Switching elements, switches and switching subsystems embodying the principles of the present invention enable the design and fabrication of high speed scalable switch fabrics. Such high-speed switch fabrics are particularly useful in network switches and routers, although not necessarily limited thereto.

BRIEF DESCRIPTION OF DRAWINGS

[0009] FIG. 1A is a conceptual block diagram of a router or a switch;

[0010] FIG. 1B is the logical block diagram of a switch router with input and output queues;

[0011] FIG. 2A is the functional block diagram of a router designed with forwarding engines;

[0012] FIG. 2B is the functional block diagram of a router designed with Interfaces and a core switch fabric;

[0013] FIG. 3 is the general architectural block diagram of a typical Input Output Interface;

[0014] FIG. 4A is the general logical block diagram of a Broadcast Switch Element (BSE);

[0015] FIG. 4B is the general logical block diagram of a Receive Switch Element (RSE);

[0016] FIG. 5A is the circuitry diagram of a Broadcast Switch Element implemented using a 5T1C cell;

[0017] FIG. 5B is the timing diagram of a read-write cycle for a BSE;

[0018] FIG. 6A is the circuitry diagram of a Receive Switch Element implemented using a 5T1C cell;

[0019] FIG. 6B is the timing diagram of a read-write cycle for an RSE;

[0020] FIG. 7 is the circuitry diagram of a port block formed by a 5T1C BSE;

[0021] FIG. 8 is the functional block diagram of a port block formed by a 5T1C BSE;

[0022] FIG. 9 is the architecture of a switching device formed by port blocks implemented with 5T1C cells;

[0023] FIG. 10 is the block diagram of a row within a switching device emphasizing the column decode;

[0024] FIG. 11A is the functional block diagram of the write decode block of FIG. 9;

[0025] FIG. 11B is a possible implementation of a decode from prior art; and

[0026] FIG. 12 is the functional block diagram of a read decode block.

DESCRIPTION OF THE PREFERRED EMBODIMENT

[0027] A conceptual diagram of a switch/routing system architecture 100 is shown in FIGS. 1A and 1B. Switch fabric 101 in conjunction with the I/O of switch/router 100 can be visualized as a number of input and output queues 102, 103 connected by non-blocking interconnections 104. Interconnections 104 may be, for example, single or multiple stage crossbars or a backplane. The input and output queues 102, 103 are typically disposed on the I/O port cards 106 a,f. System controller 105 implements a queuing/de-queuing algorithm (kernel), and generally controls the core switch fabric under software and firmware control.

[0028] Exemplary router architectures based on the current generation of network processors are shown in FIG. 2A and FIG. 2B. In the system of FIG. 2A, the processing power is in the hardware and software of forwarding engines 201. With respect to the system of FIG. 2B, the processing power is in the systems interfaces 202, including the scheduling and system control functions. Specifically, the main difference between the architectures shown in FIGS. 2A and 2B is where the actual forwarding table resides (in FIG. 2A in the forwarding engines and in FIG. 2B in the system interfaces). These route tables can be represented by data structures generated by the network processor and are stored in the system memory.

[0029] With respect to FIG. 3, a selected I/O interface 202 is modeled by the general structure shown. The forwarding engine is a firmware implementation of the algorithms. Memory buffers 301, 302 logically act as the input/output queues in the system. These memory buffers add delays to the whole process of taking a packet from the physical input port of the router (PHY Receive) to the physical output port (PHY Transmit) with the appropriate header/routing information.

[0030] FIG. 4A depicts a Broadcast Switch Element (BSEnm, taken from the nth row and mth column of the switch architecture discussed below) 401, logically represented by a 1×K de-multiplexer having one input port (iBnm) and K output ports (0Bnmk). A Receive Switch Element (RSE) 402 is logically represented by a K×1 multiplexer in FIG. 4B and has K input ports (iRnmk) and one output port (0Rnm).

[0031] According to the principles of the present invention, a 1×4 BSE 401 is implemented by a 5T1C (5 transistor, 1 capacitor) dynamic memory cell shown in FIG. 5A. The input port (gate) 501, labeled iBnm, and output ports 502 a,d, labeled 0Bnm1 to 0Bnm4, are formed by metal oxide semiconductor field effect transistors (MOSFETs). Specifically, the first output port is formed by the output transistor 502 a, the second output port by the transistor 502 b, the third output port by the transistor 502 c and the fourth and final output port by the transistor 502 d. Each 5T1C cell has a single storage element represented by the capacitor 503.

[0032] Exemplary read and write cycles for BSE element 401 are shown in FIG. 5B, where the appropriate gates are turned on as indicated by the assertion of the read and write enables. In the write cycle, the input gate 501 is turned on with the signal WRITE ENABLE WE and the storage capacitor 503 is allowed to charge to a level proportionate to the input gate transistor drive. The voltage across the storage capacitor is a function of the current, and the charging time is dictated by the time constant.
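The dependence of the charging time on the time constant can be illustrated with a simple first-order RC approximation. The following Python sketch is only an illustration under stated assumptions: the ideal RC model, the function name capacitor_voltage, and the drive voltage, on-resistance and capacitance values are hypothetical and are not taken from the disclosure.

```python
import math

def capacitor_voltage(v_drive, r_on, c_store, t):
    """Voltage on an initially discharged capacitor after t seconds of charging.

    Assumes an ideal first-order RC model (a hypothetical simplification):
    V(t) = v_drive * (1 - exp(-t / (r_on * c_store))).
    """
    tau = r_on * c_store  # time constant in seconds
    return v_drive * (1.0 - math.exp(-t / tau))

# Hypothetical values chosen only for illustration.
V_DRIVE = 1.8      # volts applied through the input gate
R_ON = 10e3        # ohms, assumed on-resistance of the input pass gate
C_STORE = 30e-15   # farads, assumed storage capacitance
tau = R_ON * C_STORE

for n in (1, 3, 5):
    v = capacitor_voltage(V_DRIVE, R_ON, C_STORE, n * tau)
    print(f"after {n} time constants: {v / V_DRIVE:.1%} of drive voltage")
```

Under this model the capacitor reaches roughly 63% of the drive level after one time constant and more than 99% after five, which is why the write enable would need to stay asserted for several time constants.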

[0033] Data written into the storage capacitor can be read out by selectively turning on the output transistors 502 a,d either individually, all at once, or in some other combination, by selecting the corresponding READ ENABLE signal RE1-RE4. In particular, if the port block, described below, to which the specific BSE belongs, is being employed for a multicast session then all the output gates can be turned on simultaneously. Otherwise the gates are normally turned on individually. To read from the storage element simultaneously with a write, a feedback mechanism external to the basic switch element retains the data and writes them back into the storage capacitor 503 in an off cycle.
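The write and read behavior of the BSE can be sketched as a purely digital behavioral model. The class and method names below are hypothetical, and the model assumes an idealized cell: it ignores charge loss on read, refresh, and the external feedback mechanism.

```python
class BroadcastSwitchElement:
    """Behavioral model of a 1xK BSE: one write gate, K read gates, one storage node."""

    def __init__(self, k=4):
        self.k = k
        self.stored = None  # None models a storage capacitor that was never written

    def write(self, write_enable, bit):
        """Store the input bit only while WRITE ENABLE is asserted."""
        if write_enable:
            self.stored = bit

    def read(self, read_enables):
        """Return one value per output port; None means that output gate is off."""
        if len(read_enables) != self.k:
            raise ValueError("expected one read enable per output port")
        return [self.stored if enabled else None for enabled in read_enables]


bse = BroadcastSwitchElement(k=4)
bse.write(write_enable=True, bit=1)
unicast = bse.read([False, True, False, False])   # only RE2 asserted
multicast = bse.read([True, True, True, True])    # RE1-RE4 asserted together
print(unicast, multicast)
```

In this sketch a unicast read drives a single output port, while asserting all four read enables reproduces the multicast case in which the same stored bit appears on every output.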

[0034] The inventive concepts can also be applied to RSE core 402 as shown in FIG. 6A. Here, the RSE core is implemented with gate transistors 601 a,d forming the input ports and the gate transistor 602 forming the output port. The storage element is again represented by a capacitor, in this case capacitor 603. It should be understood that at a given time during the operation of the RSE only one input port 601 may be used to write data into the storage element represented by the storage capacitor 603.

[0035] In the case of an RSE, the operation is the reverse of the operation of the BSE, as shown in FIG. 6B. In the first cycle, data can be written into storage capacitor 603 by the use of any one of the input port gates 601 a,d and the WRITE ENABLE signals WE1-WE4. When multi-valued storage systems are possible using a single storage element, all four gates can be used concurrently to store multiple values into the storage capacitor 603. Data can be read from the output port gate 602 simultaneously with a write if a feedback mechanism is provided external to the core RSE switch element.
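The RSE can be sketched in the same behavioral style, mirroring the BSE. The names below are hypothetical, and the model assumes single-valued storage, so it rejects concurrent writes; the multi-valued case mentioned above is deliberately not modeled.

```python
class ReceiveSwitchElement:
    """Behavioral model of a Kx1 RSE: K write gates, one read gate, one storage node."""

    def __init__(self, k=4):
        self.k = k
        self.stored = None  # None models a storage capacitor that was never written

    def write(self, write_enables, inputs):
        """Store the bit from the single input port whose write enable is asserted."""
        if len(write_enables) != self.k or len(inputs) != self.k:
            raise ValueError("expected one enable and one input per port")
        active = [i for i, enabled in enumerate(write_enables) if enabled]
        if len(active) > 1:
            # Single-valued storage assumption: one writer at a time.
            raise ValueError("only one input port may write at a time")
        if active:
            self.stored = inputs[active[0]]

    def read(self, read_enable):
        """Return the stored bit, or None when the output gate is off."""
        return self.stored if read_enable else None


rse = ReceiveSwitchElement(k=4)
rse.write([False, False, True, False], [0, 0, 1, 0])   # WE3 asserted
print(rse.read(read_enable=True))
```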

[0036] With respect to FIG. 7, a port block 700 that is P bits wide is created using P number of 5T1C BSEs 500. All the input ports of the P number of BSEs 500 are taken together to form the input port I_(PB)^(NM) of the port block 700, with each input controlled by a corresponding write enable signal WE1-WE4. The illustrated port block has 4 output ports O_(PB)^(NM1)-O_(PB)^(NM4). The first outputs of each of the BSEs 500 are tied together to form output port 1 O_(PB)^(NM1). In a similar fashion, output port 2 O_(PB)^(NM2) of the port block is formed by taking all the second output ports of each of the BSEs 500 together, and so on, such that each of the output ports is formed in a linear fashion. Other, nonlinear combinations of BSEs can be used to form a port block. FIG. 8 shows the interface diagram of the port block.
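The linear grouping of BSE lines into P-bit ports can be sketched as follows. This is a minimal behavioral model with hypothetical names; it assumes bit i of a word maps to BSE i and treats every output port as a view of the same P stored bits.

```python
class PortBlock:
    """P-bit-wide port block built from P one-bit cells, each with K read gates."""

    def __init__(self, p_bits, k_outputs=4):
        self.p_bits = p_bits
        self.k_outputs = k_outputs
        self.cells = [None] * p_bits  # one storage node per BSE

    def write(self, word):
        """Write a P-bit word: bit i goes to the storage node of BSE i."""
        if not 0 <= word < (1 << self.p_bits):
            raise ValueError("word does not fit in P bits")
        self.cells = [(word >> i) & 1 for i in range(self.p_bits)]

    def read(self, port):
        """Read output port `port` (1..K): gathers output line `port` of every BSE."""
        if not 1 <= port <= self.k_outputs:
            raise ValueError("no such output port")
        if None in self.cells:
            raise RuntimeError("port block has not been written")
        return sum(bit << i for i, bit in enumerate(self.cells))


pb = PortBlock(p_bits=8)
pb.write(0xA5)
print(hex(pb.read(1)), hex(pb.read(4)))
```

Because each BSE holds one bit that is visible on all of its output lines, any of the K output ports returns the same word; which port is actually driven is decided by the read enables.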

[0037] A switch matrix of size N×M, within the DIPS device (900), is formed by port blocks 700 arranged in rows and columns as shown in FIG. 9. (It is not necessary that the individual port blocks be arranged in a row-column fashion and interconnected in a matrix format.) In addition to the matrix of port blocks 700, DIPS device 900 also includes Write Decode and Read Decode blocks 901, 902, Lookup Decode 903 and controls 904.

[0038] With respect to FIG. 10, each row N of port blocks has one P-bit wide input I_(N), which feeds into a 1 to M input demux (1001). This demux is a form of decode and is essentially part of the write decode block 901. Demux 1001 is preferably of a conventional design, using combinational circuits such as cascaded, domino, etc. Based on the decode code given to the decode circuit, the input data on the input port I_(N) is sent to the appropriate port block in the row. For each row N in the DIPS device there is one input demux 1001, which allows one input to be tied to each of the inputs of the M port blocks in a row.

[0039] When each port block comprises four 5T1C memory cells, each row of port blocks 700 has four outputs O_(N1)-O_(N4) that are each P bits wide, each coupled through an output mux (1002). Each output mux (1002) is an M to 1 mux. Preferably, the P-bit wide outputs of the port blocks are tied to the output muxes 1002 as follows: the first output O_(PB)^(NM1) of each of the M port blocks 700 in the row is an input to the first output mux 1002 a, the second output O_(PB)^(NM2) of each of the port blocks 700 is an input to the second output mux 1002 b, and so on for all four outputs.
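The row-level routing, one input demux feeding M port blocks and K output muxes selecting among them, can be sketched as below. The names are hypothetical, and each port block is reduced to a single stored word for brevity.

```python
class PortBlockRow:
    """One row: a 1-to-M input demux, M port blocks, and K M-to-1 output muxes."""

    def __init__(self, m_blocks, k_outputs=4):
        self.m_blocks = m_blocks
        self.k_outputs = k_outputs
        # Each port block is reduced to one stored P-bit word (None = empty).
        self.blocks = [None] * m_blocks

    def write(self, column, word):
        """Input demux: steer the row input to the port block in `column`."""
        self._check_column(column)
        self.blocks[column] = word

    def read(self, output, column):
        """Output mux number `output` (1..K) selects among the M port blocks."""
        if not 1 <= output <= self.k_outputs:
            raise ValueError("no such output mux")
        self._check_column(column)
        return self.blocks[column]

    def _check_column(self, column):
        if not 0 <= column < self.m_blocks:
            raise ValueError("column out of range")


row = PortBlockRow(m_blocks=8)
row.write(column=3, word=0xBEEF)
print(hex(row.read(output=1, column=3)))
```

Columns are numbered from 0 here purely as a coding convenience; the disclosure does not fix a numbering convention.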

[0040] Output muxes 1002 are part of read decode block 902 in the DIPS device 900. Each of the output muxes is formed by combinatorial circuits and implements a 1 of M decode. The outputs of each of these muxes are sent to an I/O block that is part of the controller (904) for the DIPS device. DIPS device 900 has a single output through the output port of the device, which is P bits wide. DIPS device 900 also includes a single input port that is also P bits wide. These constraints are placed on the DIPS device due to semiconductor packaging limitations.

[0041] FIGS. 11A and 11B are more detailed diagrams of Write Decode block 901. The output of input demux 1001 is sent to a write drive block (1101) that ties into the input gate of each BSE. FIG. 12 is a more detailed diagram of Read Decode block 902. Each of the output gates of the BSE ties into an amplification block (1201) that is formed by a differential amplifier as shown. The outputs of the differential amplifier drive the inputs to the combinatorial output mux 1002. Within the port block, a reference cell can be used to drive the differential inputs to the amplifier 1201, or a shadow 5T1C cell that is used for redundancy can be used to drive the reference input to the differential amplifier.

[0042] If the shadow 5T1C cell is used, then each of the port blocks forms a mirrored memory element and switch. The mirrored memory element and switching device can be used to control errors in reading or writing. This implements a pseudo cache.

[0043] With respect to FIG. 11A, the write decode block is implemented to form a 1 to M decode for each port block. A control input that is Log₂(NM) bits wide is decoded into the appropriate port block address within the row. A simple decode scheme is shown in this embodiment. It should be clear to those of ordinary skill in the art that the decode can be changed without departing from the spirit of the invention.
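One possible reading of this decode can be sketched in Python. The sketch assumes a flat, row-major address over the N×M port blocks and a one-hot column select; both choices, and the function names, are hypothetical and are not specified by the disclosure.

```python
def address_width(n_rows, m_cols):
    """Bits needed to address N*M port blocks, i.e. ceil(log2(N*M))."""
    return max(1, (n_rows * m_cols - 1).bit_length())

def decode(address, n_rows, m_cols):
    """Split a flat address into a row index and a one-hot column select.

    Assumes row-major numbering of port blocks (a hypothetical convention).
    """
    if not 0 <= address < n_rows * m_cols:
        raise ValueError("address out of range")
    row, column = divmod(address, m_cols)
    one_hot = [1 if c == column else 0 for c in range(m_cols)]
    return row, one_hot


width = address_width(4, 8)          # a 4 x 8 matrix of port blocks
row, select = decode(19, 4, 8)
print(width, row, select)
```

For a 4×8 matrix the control input is 5 bits wide, and address 19 selects row 2 with only the fourth column line asserted.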

[0044] The operation of DIPS device 900 can be summarized as follows:

[0045] 1) An external Switch Controller asserts the appropriate read and write signals to the DIPS device that is part of the Switch fabric matrix.

[0046] 2) The read and write signals are decoded by the controller (904) for the assertion of the reads and writes to the port blocks internally within the DIPS device (900).

[0047] 3) The reads and writes are decoded by the read-decode blocks and the write-decode blocks within the DIPS device.

[0048] 4) The writes and reads are done asynchronously and in the same clock cycle; thus, in a given clock cycle, using a simple linear decode, one can access at least two port blocks.

[0049] The throughput of a DIPS device based on the aforementioned protocol followed by the read and write cycles is thus 2 * P bits * speed in MHz of the DIPS device. Thus, for a 100 MHz DIPS device with a port block that is 64 bits wide, the throughput is 2 * 64 * 100 MHz = 12.8 Gbps. For a fabric implemented using multiple DIPS devices, the throughput is (number of DIPS devices) * 12.8 Gbps per DIPS device.
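The throughput arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the four-device fabric is an arbitrary example rather than a configuration from the disclosure.

```python
def dips_throughput_bps(port_width_bits, clock_hz, accesses_per_cycle=2):
    """Throughput of one DIPS device: one read plus one write per clock cycle."""
    return accesses_per_cycle * port_width_bits * clock_hz

def fabric_throughput_bps(n_devices, port_width_bits, clock_hz):
    """Throughput of a fabric built from several identical DIPS devices."""
    return n_devices * dips_throughput_bps(port_width_bits, clock_hz)


single = dips_throughput_bps(port_width_bits=64, clock_hz=100_000_000)
fabric = fabric_throughput_bps(n_devices=4, port_width_bits=64, clock_hz=100_000_000)
print(single / 1e9, "Gbps per device;", fabric / 1e9, "Gbps for four devices")
```

With P = 64 and a 100 MHz clock this reproduces the 12.8 Gbps figure for one device, and four such devices give 51.2 Gbps.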

[0050] A similar implementation of the DIPS device can be done using the RSE. While a particular embodiment of the invention has been shown and described, changes and modifications may be made therein without departing from the invention in its broader aspects, and, therefore, the aim in the appended claims is to cover all such changes and modifications as fall within the true spirit and scope of the invention.

What is claimed is:
 1. A switching element comprising: a first port comprising a plurality of lines; a second port comprising a plurality of lines; a third port comprising a plurality of lines; a first memory cell including a storage element, a first pass gate for selectively coupling a first line of said first port to said storage element, a second pass gate for selectively coupling a first line of said second port to said storage element, and a third pass gate for selectively coupling a first line of said third port to said storage element; and a second memory cell including a storage element, a first pass gate for selectively coupling a second line of said first port to said storage element, a second pass gate for selectively coupling a second line of said second port to said storage element, and a third pass gate for selectively coupling a second line of said third port to said storage element.
 2. The switching element of claim 1 wherein the first and second ports comprise output ports for reading data from said storage elements of said memory cells and said third port comprises an input port for writing data to said storage elements of said memory cells.
 3. The switching element of claim 1 wherein the first and second ports comprise input ports for reading data from said storage elements of said memory cells and said third port comprises an output port for writing data to said storage elements of said memory cells.
 4. The switching element of claim 1 wherein said storage elements comprise capacitors.
 5. The switching element of claim 1 wherein said pass gates comprise field effect transistors.
 6. A switch comprising: a plurality of port blocks each comprising: a plurality of I/O ports; and a plurality of memory cells each including a first pass gate for coupling a selected line of the first port with a storage element and a second pass gate for coupling a selected line of the second port with said storage element; read decoder circuitry for selecting one of said plurality of I/O ports of a first selected one of said plurality of port blocks and reading data from a selected memory cell of said second selected port block; and write decoder circuitry for selecting one of said plurality of I/O ports of a second selected one of said plurality of port blocks and writing data into a selected memory cell of said second selected port block.
 7. The switch of claim 6 wherein said read decoder comprises a demultiplexer selectively coupling an output port to said selected one of said plurality of I/O ports of said first selected port block.
 8. The switch of claim 6 wherein said read decoder comprises a multiplexer selectively coupling an input port with a selected one of said plurality of I/O ports of said second selected port block.
 9. The switch of claim 6 wherein said plurality of port blocks are organized as an array of N number of rows and M number of columns, each of said port blocks having K number of P-bit wide output ports and said read decoder comprises: for each of said N number of rows, M number of K×1 multiplexers each for selecting one of K number of P-bit wide output ports from each of said port blocks of said M number of columns; and an N×1 multiplexer for selecting one of N number of output ports selected by said M number of K×1 multiplexers.
 10. The switch of claim 6 wherein said plurality of port blocks are organized as an array of N number of rows and M number of columns, each of said port blocks having K number of P-bit wide input ports, and said write decoder comprises: for each of N number of rows, a 1×M demultiplexer for selecting an input port from a selected one of said port blocks of said M number of columns: and a 1×N demultiplexer for selecting between inputs from each of said 1×M demultiplexers.
 11. The switch of claim 6 wherein said memory cells comprise dynamic random access memory cells.
 12. The switch of claim 6 wherein each of said plurality of memory cells of each said port block is coupled to a plurality of output I/O ports and an input I/O port.
 13. The switch of claim 6 wherein each of said plurality of memory cells of each said port block is coupled to a plurality of input I/O ports and an output port.
 14. A switch comprising a plurality of port blocks organized in an array of N rows and M columns, each said port block comprising a first P-line wide port; a plurality of K number of P-line wide second ports; and a plurality of P number of memory cells each having a first pass gate for selectively coupling said cell to a corresponding one of said lines of said P-line wide first port and a plurality of K number of second pass gates each for selectively coupling said cell to a corresponding line of said P-line wide second ports; first decoder circuitry comprising: for each of said N number of rows, M number of K×1 multiplexers each for selecting one of said K number of P-bit wide second ports from each of said port blocks of said M number of columns; and an N×1 multiplexer for selecting one of N number of ports selected by said M number of K×1 multiplexers; and second decoder circuitry comprising: for each of said N number of rows, a 1×M demultiplexer for selecting one of said first ports of said port blocks of said M number of columns: and a 1×N demultiplexer for selecting one of N number of first ports selected by said 1×M demultiplexer.
 15. The switch of claim 14 wherein said first port of said port blocks comprises an input port and said second decoder comprises a read decoder.
 16. The switch of claim 14 wherein said plurality of second ports of said port blocks comprise output ports and said first decoder comprises a write decoder.
 17. The switch of claim 14 wherein said first port of a selected one of said port blocks comprises an output port and said plurality of second ports of said selected one of said port blocks comprise output ports.
 18. The switch of claim 14 wherein said first and second pass gates of a selected one of said memory cells of a selected one of said port blocks selectively couple said first and second ports to a storage capacitor.
 19. The switch of claim 14 wherein: a first one of said plurality of pass gates of a first selected one of said memory cells of a selected port block couples said first selected memory cell to a first line of a first one of said second ports and a second one of said plurality of pass gates couples said first selected memory cell with a first line of a second one of said second ports; and a first one of said plurality of pass gates of a second selected one of said memory cells of said selected port block couples said second selected memory cell with a second line of said first one of said second ports and a second one of said pass gates of said second selected memory couples said second selected memory cells with a second line of said second one of said second ports.
 20. The switch of claim 14 wherein said pass gates of said memory cells comprise transistors.